Abstract — This literature review sought articles that exemplify and describe the use of multivariate analysis in different fields of the Forest Agricultural Sciences, with attention to effective practices that use multivariate statistical techniques to process data simultaneously. For data collection, 70 technical articles were selected for the meta-analysis, of which 54 were used in the study of multivariate techniques applied in the agricultural sciences. The results showed that studies in certain areas within the Forest Agricultural Sciences exhibit some regularity in their use of multivariate analysis, and that the most commonly applied analyses were Cluster Analysis (AA) and Principal Component Analysis (PCA). The use of multivariate analysis in studies and evaluations of experiments in the Agricultural Sciences thus proved to be of great value, allowing greater clarity and better interpretation of complex phenomena.

Keywords — Multivariate Analysis, Multivariate 
Methods, Forest Agricultural Sciences. 

I. INTRODUCTION 

Statistically, data analysis is classified as univariate or multivariate, i.e., it treats variables alone or jointly, respectively. According to Vicini (2005), until the advent of computers data were treated only in isolation, and when a phenomenon depended on many variables such analysis became unfeasible.

Multivariate analysis corresponds to a large number of methods and techniques that use all variables simultaneously in the theoretical interpretation of the data set obtained (Neto, 2004). According to Hair et al. (2009), multivariate techniques are popular because they allow organizations to create knowledge, thereby improving their decision-making. Multivariate analysis refers to all statistical techniques that simultaneously analyze multiple measurements on the individuals or objects under investigation.

For Gerhardt et al. (2001), multivariate analysis approaches data through a set of statistical techniques that consider measurements of many variables simultaneously. To obtain such results, multivariate methods are applied to the data according to the research objectives, since exploratory data analysis aims to generate hypotheses, which is exactly the goal of multivariate analysis (Vicini, 2005).

Multivariate analysis is a vast field in which even experienced statisticians move carefully; because it is a new area of science, much is yet to be discovered. The art of using multivariate analysis lies in choosing the most appropriate options to detect the patterns expected in the data (Magnusson, 2003).

The purpose of its application may be to reduce data or simplify structure, to sort and group, to investigate the dependence between variables, to predict, and to develop and test hypotheses (Johnson; Wichern, 1992).

Multivariate techniques can meet the specific interests of a forestry company or a research institution, whether these concern a single property or a set of properties. Thus, this study aims to quantify and clarify which of the main tools of multivariate analysis are applied in the various areas of the forest agricultural sciences, and how they are used, by reviewing a number of recent articles in which various multivariate techniques are employed.


II. REVIEW 

The application of multivariate analysis combines the multiple pieces of information recorded on the experimental unit, so that selection is based on a complete set of important variables that discriminate the most promising materials (Maeda et al., 2001). Since multivariate techniques have numerous applications, one needs to know the main ones applied in the Forest Agricultural Sciences, together with their functions and objectives. The main examples of multivariate techniques and their foundations are the multivariate normal distribution, matrices and vectors, quadratic forms, eigenvalues and eigenvectors, multivariate analysis of variance (MANOVA), multivariate linear regression models, simultaneous tests on several variables, multivariate distances, principal component analysis, factor analysis, cluster analysis, discriminant analysis and canonical correlation analysis.

Factor analysis

Factor Analysis (FA) aims to reduce the number of initial variables with the least possible loss of information, using a set of statistical techniques (Vicini, 2005). Carvalho (2013) states that whenever variables are strongly correlated it is reasonable to place them in the same group, since variables in different groups are only weakly correlated.

Factor analysis is applied when there is a large number of correlated variables. It includes principal component analysis and common factor analysis, and seeks to identify a smaller number of new, uncorrelated variables that summarize the main information in the original variables; these are the factors or latent variables (Mingoti, 2005).

According to Carvalho (2013), the generic formula for applying a factor analysis is:

$$X = \mu + \Lambda F + \varepsilon \qquad (1)$$

Here $X = [X_1\ X_2\ \dots\ X_p]^T$ is a real random vector of dimension $p$, with mean vector $\mu = [\mu_1\ \mu_2\ \dots\ \mu_p]^T$ and positive-definite covariance matrix $\Sigma$. In the factor analysis model, each observable variable $X_i$ is expressed as a linear function of $m$ random variables $F_1, F_2, \dots, F_m$ ($m < p$), called common factors, plus one unique factor or error, $\varepsilon_i$, $i = 1, 2, \dots, p$, which is also a random variable and accounts for the part of the variance of $X_i$ not explained by the common factors. $\Lambda$ is the $p \times m$ matrix of factor loadings; the $m$ common factors and the $p$ unique factors are not observable.

Kaiser-Meyer-Olkin (KMO) method

In factor analysis the adequacy of the data is very important, and a measure of it was proposed by Kaiser, Meyer and Olkin (KMO). The KMO test is based on the principle that the inverse of the correlation matrix approaches a diagonal matrix, and it therefore compares the correlations between the observed variables with their partial correlations (Solomon et al., 2012).

According to Vicini (2005), the KMO can be obtained by the following equation:

$$KMO = \frac{\sum_{i}\sum_{j \neq i} r_{ij}^{2}}{\sum_{i}\sum_{j \neq i} r_{ij}^{2} + \sum_{i}\sum_{j \neq i} a_{ij}^{2}} \qquad (2)$$

That is, the sum of the squared correlations between all variables is divided by itself plus the sum of the squared partial correlations between all variables.

Where:

$r_{ij}$ is the correlation coefficient between the observed variables $i$ and $j$;

$a_{ij}$ is the partial correlation coefficient between the same variables. The $a_{ij}$ should be close to zero, because the factors are orthogonal to each other.

For the data to be suitable for factor analysis, the value obtained from Kaiser's equation should be interpreted as follows:

Table 1: Relationship between the KMO and the use of Factor Analysis

KMO               FA recommendation
> 0.9             Great
> 0.8 and < 0.9   Good
> 0.7 and < 0.8   Average
> 0.6 and < 0.7   Acceptable
> 0.5 and < 0.6   Weak
< 0.5             Unacceptable

Source: Carvalho, 2013
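
As an illustration of equation (2) and Table 1, the following is a minimal sketch in Python using only the standard library. The correlation values are hypothetical, the function names are ours, and the three-variable formula for partial correlations is a standard textbook result rather than something taken from the reviewed articles. Table 1 leaves the boundary values open, so the sketch assumes they fall in the lower class.

```python
from math import sqrt

def partial_correlations_3(r):
    """Partial correlation of each pair of variables, controlling for the third one."""
    a = [[1.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            if i != j:
                k = 3 - i - j  # index of the remaining variable
                a[i][j] = (r[i][j] - r[i][k] * r[j][k]) / sqrt(
                    (1 - r[i][k] ** 2) * (1 - r[j][k] ** 2))
    return a

def kmo(r, a):
    """Equation (2): squared correlations over themselves plus squared partial correlations."""
    p = len(r)
    pairs = [(i, j) for i in range(p) for j in range(p) if i != j]
    sum_r2 = sum(r[i][j] ** 2 for i, j in pairs)
    sum_a2 = sum(a[i][j] ** 2 for i, j in pairs)
    return sum_r2 / (sum_r2 + sum_a2)

def fa_recommendation(value):
    """Labels of Table 1 (assumption: boundary values fall in the lower class)."""
    for threshold, label in [(0.9, "Great"), (0.8, "Good"), (0.7, "Average"),
                             (0.6, "Acceptable"), (0.5, "Weak")]:
        if value > threshold:
            return label
    return "Unacceptable"

# Hypothetical correlation matrix of three observed variables.
r = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]

kmo_value = kmo(r, partial_correlations_3(r))
label = fa_recommendation(kmo_value)
print(round(kmo_value, 3), label)
```

For this hypothetical matrix the partial correlations are not close to zero, so the KMO falls in the "Acceptable" band rather than a higher one.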


Bartlett's sphericity test

Another test widely used in factor analysis is Bartlett's test of sphericity (BTS), which tests the hypothesis that the correlation matrix is an identity matrix, i.e., that the values on the main diagonal equal 1 and all other values equal zero, so that its determinant equals 1. Under this hypothesis the variables are uncorrelated; the null hypothesis can be rejected if, for an adopted significance level α of 0.05 (5%), the significance value found is less than α (Pereira, 2001).

Bartlett's test evaluates the overall significance of the correlation matrix, i.e., it tests the null hypothesis that the correlation matrix is an identity matrix (Solomon et al., 2012).
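
The test can be sketched as follows. The reviewed articles do not state the test statistic, so this sketch assumes the usual chi-square approximation, χ² = −(n − 1 − (2p + 5)/6) ln|R|, with p(p − 1)/2 degrees of freedom. The correlation matrix and the sample size n = 50 are hypothetical, and the critical value 7.815 is the tabulated chi-square value for 3 degrees of freedom at α = 0.05.

```python
from math import log

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda row: abs(a[row][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for row in range(c + 1, n):
            factor = a[row][c] / a[c][c]
            for k in range(c, n):
                a[row][k] -= factor * a[c][k]
    return det

def bartlett_sphericity(corr, n_obs):
    """Chi-square statistic and degrees of freedom (assumed standard approximation)."""
    p = len(corr)
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6) * log(determinant(corr))
    degrees_of_freedom = p * (p - 1) // 2
    return chi_square, degrees_of_freedom

# Hypothetical correlation matrix estimated from 50 observations.
corr = [[1.0, 0.6, 0.5],
        [0.6, 1.0, 0.4],
        [0.5, 0.4, 1.0]]

chi_square, df = bartlett_sphericity(corr, n_obs=50)
CRITICAL_VALUE_DF3_ALPHA_005 = 7.815  # tabulated chi-square value, 3 d.f., alpha = 0.05
reject_identity = chi_square > CRITICAL_VALUE_DF3_ALPHA_005
print(round(chi_square, 2), df, reject_identity)
```

Rejecting the identity hypothesis here indicates that the variables are correlated enough for factor analysis to be worth attempting.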

Principal Component Analysis 

The goal of principal component analysis (PCA) is to address issues such as the generation, selection and interpretation of the components investigated, and thereby to determine the most influential variables in the formation of each component (Vicini, 2005).

According to Castro et al. (2013), in PCA the variance and covariance structure of a random vector composed of p random variables can be explained by constructing linear combinations of the original variables.

The principal components are estimated from the covariance matrix of the data. Before the analysis is applied, the data must be standardized so that all series have the same magnitude of values. The eigenvectors are then obtained; their values represent the weights of each variable in each component, range from -1 to 1, and function as correlation coefficients that represent the contribution of each variable to explaining the total variation of the data (Ruhoff et al., 2009).
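
The steps just described (standardize, build the covariance matrix of the standardized data, extract eigenvectors) can be sketched as below. This is a minimal standard-library illustration, not the procedure of any reviewed article: the measurements are hypothetical, and power iteration is used here only as a simple way to obtain the first component without external libraries.

```python
from math import sqrt

def standardize(values):
    """Center a variable on its mean and scale it to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def correlation_matrix(variables):
    """Covariance matrix of the standardized variables, i.e. the correlation matrix."""
    z = [standardize(v) for v in variables]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def multiply(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def first_component(matrix, iterations=200):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    p = len(matrix)
    vector = [1.0 / sqrt(p)] * p
    for _ in range(iterations):
        product = multiply(matrix, vector)
        norm = sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    eigenvalue = sum(v * w for v, w in zip(vector, multiply(matrix, vector)))
    return eigenvalue, vector

# Hypothetical measurements of three tree variables (illustrative values only).
height = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
diameter = [5.1, 6.0, 6.8, 8.1, 8.9, 10.2]
volume = [0.9, 1.4, 1.9, 2.7, 3.2, 4.1]

corr = correlation_matrix([height, diameter, volume])
eigenvalue, weights = first_component(corr)
explained = eigenvalue / len(corr)  # trace of corr equals the number of variables
print(weights, explained)
```

Because the three hypothetical variables are strongly correlated, a single component carries most of the total variation, which is the data-reduction effect the text describes.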

Cluster analysis

Cluster Analysis (AA) identifies groups of objects in multivariate data. The goal is to form groups with homogeneous properties from large heterogeneous samples. The groups sought should be as homogeneous as possible internally, with differences between them as large as possible (Hair et al., 2005).

AA encompasses a variety of techniques and algorithms whose goal is to find similar data and place them in the same group, distinct from the data in the other groups (Vicini, 2005).

According to Ruhoff et al. (2009), AA seeks to group the data elements that are most alike. The groups are determined so as to obtain homogeneity among the elements within each group and heterogeneity between groups.

Dendrogram

The result of AA is the dendrogram, also known as a phenogram or tree diagram, a graph that summarizes the groups obtained by the analysis.


[Figure: dendrogram; axis: Euclidean distance]

Fig. 1: Grouping according to the quality of wood for the production of charcoal, obtained by the single-linkage method using the Euclidean distance.

Source: Castro et al., 2013.

Figure 1 shows that genetic materials 11 and 8 have the greatest similarity in the dendrogram, since they have the smallest Euclidean distance, and so form the first group. Next come variables 10 and 9, then 1 and 5, and so on; the variables are grouped in descending order of similarity. Variable 12 formed the last group of the dendrogram and remained distinct from the other groups, because it has little resemblance to the others.
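
The merging order read off a dendrogram such as Figure 1 can be reproduced with a short single-linkage sketch. The points below are hypothetical and are not the data of Castro et al. (2013); the function name is ours, and the standard-library `math.dist` supplies the Euclidean distance.

```python
from math import dist  # Euclidean distance, available from Python 3.8

def single_linkage(points):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = {label: [label] for label in points}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = list(clusters)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                # Single linkage: distance between the two closest members.
                d = min(dist(points[x], points[y])
                        for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((tuple(clusters[a]), tuple(clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges

# Hypothetical objects described by two standardized variables.
points = {"A": (1.0, 1.0), "B": (1.5, 1.0), "C": (5.0, 5.0),
          "D": (5.0, 6.0), "E": (10.0, 0.0)}

merges = single_linkage(points)
for members_a, members_b, distance in merges:
    print(members_a, members_b, round(distance, 3))
```

In this toy example A and B merge first because they are closest, and E joins last, playing the role that variable 12 plays in Figure 1.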

Euclidean distance

In Cluster Analysis, distance measures are important, and among them is the Euclidean distance, a measure of dissimilarity. According to Parents; Silva; Ferreira (2012), for two points A and B the Euclidean distance can be calculated with the following formula:

$$D_{AB} = \sqrt{\sum_{j=1}^{p} (x_{jA} - x_{jB})^{2}} \qquad (3)$$

In matrix form, this distance is given by:

$$D_{AB} = \sqrt{(x_A - x_B)'\,(x_A - x_B)} \qquad (4)$$
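
Equation (3) translates directly into code. The sketch below is a minimal illustration with hypothetical coordinates; the function name is ours.

```python
from math import sqrt

def euclidean_distance(a, b):
    """Equation (3): square root of the summed squared differences over p variables."""
    if len(a) != len(b):
        raise ValueError("both points must have the same number of variables")
    return sqrt(sum((xa - xb) ** 2 for xa, xb in zip(a, b)))

# Hypothetical points A and B measured on p = 3 variables.
point_a = (1.0, 2.0, 3.0)
point_b = (4.0, 6.0, 3.0)

d_ab = euclidean_distance(point_a, point_b)
print(d_ab)
```

Because the distance adds raw squared differences, variables on larger scales dominate it, which is one reason the data are usually standardized first.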

Mahalanobis distance 

When the similarity between samples (treatments, individuals, populations) is assessed over a set of correlated characteristics, the distance between any pair of sampling units must take into account the degree of dependence between the variables. To quantify the distance between two populations when there is data replication, the Mahalanobis distance ($d^2$) is recommended (Vicini, 2005).
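
The text does not give the formula, so the sketch below assumes the standard definition, d² = (x − y)' S⁻¹ (x − y), where S is the covariance matrix of the variables. It is restricted to two variables so that the inverse can be written in closed form, and the covariance values are hypothetical.

```python
def inverse_2x2(s):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = s
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def mahalanobis_squared(x, y, cov):
    """Squared Mahalanobis distance between x and y (assumed standard definition)."""
    inv = inverse_2x2(cov)
    diff = [x[0] - y[0], x[1] - y[1]]
    return sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))

# Hypothetical covariance matrix of two positively correlated variables.
cov = [[2.0, 1.0],
       [1.0, 2.0]]
origin = (0.0, 0.0)

d2_along = mahalanobis_squared((1.0, 1.0), origin, cov)     # follows the correlation
d2_against = mahalanobis_squared((1.0, -1.0), origin, cov)  # runs against it
print(d2_along, d2_against)
```

Both points are at the same Euclidean distance from the origin, yet the one that departs from the correlation pattern is farther in the Mahalanobis sense, which is how the measure accounts for dependence between variables.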

Canonical Correlation Analysis 

Canonical Correlation Analysis (CCA) has as its main objective the study of the linear relationships between two sets of variables. The analysis summarizes the information in each set of variables in linear combinations chosen to maximize the correlation between the two sets (Mingoti, 2005). According to Protasio et al. (2012), CCA is a multivariate statistical technique that aims to check associations between groups with different characteristics.

This multivariate model makes it possible to discover the relationship between two groups or sets of variables by maximizing the correlation between the vectors of independent and dependent variables (Burt, 2015).

Multiple Regression Analysis

Multiple regression predicts the changes in the dependent variable in response to changes in the independent variables. The method is suitable when a single metric dependent variable is related to two or more independent variables (Hair et al., 2005).
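
A minimal sketch of this setting, with one metric dependent variable and two independent variables, is given below. It assumes ordinary least squares fitted through the normal equations, which the reviewed articles do not specify; the data are hypothetical and were generated exactly from y = 1 + 2·x1 + 3·x2 so that the result is easy to check.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda row: abs(m[row][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for row in range(n):
            if row != c:
                factor = m[row][c] / m[c][c]
                m[row] = [vr - factor * vc for vr, vc in zip(m[row], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_multiple_regression(xs, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(row) for row in xs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical observations of two independent variables and one dependent variable.
xs = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
y = [9.0, 8.0, 19.0, 18.0, 26.0]

coefficients = fit_multiple_regression(xs, y)
print([round(c, 6) for c in coefficients])
```

Each slope gives the expected change in the dependent variable for a one-unit change in that independent variable with the other held fixed.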

Discriminant analysis 

Multiple discriminant analysis consists of a set of tools and methods used to distinguish groups of populations and to classify new observations into given groups; it is used when the groups are known a priori (Mingoti, 2005).

MANOVA / MANCOVA

Multivariate analysis of variance (MANOVA) and multivariate analysis of covariance (MANCOVA) aim to verify the similarity between multivariate groups by simultaneously exploring the relationship between several independent variables and two or more metric dependent variables (Hair et al., 2005).

III. RESULTS AND DISCUSSIONS 

In this study we selected 54 articles in the area of Forest Agricultural Sciences that apply multivariate statistical methods. The selected articles were published between 1990 and 2018; within this range, 2015 was the year with the most publications, 8 in total, followed by 2003 and 2012 with 6 publications each. In contrast, the years 1990, 2002, 2005, 2011, 2016 and 2017 contributed one article each.

Among the multivariate analyses used in the selected works, the most frequent were cluster analysis, used 25 times, followed by Principal Component Analysis, 20 times, factor analysis, 10 times, Canonical Correspondence Analysis, 9 times, and Discriminant Analysis, 8 times.

Looking at the multivariate analysis used in each study, we observed that a pattern had been established between the multivariate method used and certain lines of research in the area. We therefore sought to verify this pattern by separating the studies by subject and quantifying which types of multivariate method were used most.

Multivariate analysis in studies involving soil management

Of the 11 works found in this area, Freitas et al. (2015b) and Mantovanelli et al. (2015) used the same multivariate techniques: Cluster Analysis (AA), Principal Component Analysis (PCA) and MANOVA. PCA was the most recurrent technique in the works found in this area, as noted in Silva et al. (2010a), Silva et al. (2010b), Oliveira et al. (2015), Jordan (2018), Silva et al. (2009) and Barreto et al. (2006). In addition to the recurrent AA, multivariate techniques such as discriminant analysis (DA), Canonical Correspondence Analysis (CCA) and factor analysis were also used, as noted in Gerhardt et al. (2001), Baretta; Baretta; Cardoso (2008) and Benttes et al. (2010).

Multivariate Analysis in environmental studies 

In this area of study, another 12 works were selected, in which factor analysis was the most recurrent multivariate method, as noted in Scatena (2005), Campos et al. (2015), Cunha et al. (2008), Parents; Silva; Ferreira (2012), Pinto; Col. (2014) and Silva; Feather; Souza (2015). Other multivariate analyses, such as Regression Analysis (Calijuri et al., 2009), Discriminant Analysis (Braga et al., 2009) and Cluster Analysis (Silva, 2003), were also used to a lesser extent in this area of study.

The articles of Bertossi (2013), Bertossi et al. (2013) and Hugo et al. (2012) apply multivariate analysis to data on water quality indicators; all of these studies use PCA as the main multivariate analysis to analyze the data.

Multivariate analysis techniques in experiments involving forests and products of forest origin

In this part, 31 studies fit this line of research and were separated into sub-items for better visualization of the multivariate analyses applied in this area of study.

Regarding the application of multivariate analysis to floristic data, 7 articles fall under this topic. No pattern was found in the use of multivariate methods, whose use was well distributed in this type of study. Higuchi et al. (2012) and Higuchi et al. (2013) used PCA and AA to analyze their data; dealing with similar themes in different areas of Santa Catarina, the methods suited the proposed studies perfectly. In the work of Peixoto (2004) in Rio de Janeiro and of Narvaes; Longhi; Brena (2008) in Rio Grande do Sul, which are also similar studies in different areas, the same multivariate technique could be applied to both, and AA managed to separate similar data into different groups, which were easily identified using the dendrogram and the Euclidean distance. Still on floristic analysis, Souza et al. (2003a) and Bertani (2001) used CCA to analyze floristic diversity in riparian forests. Lastly, Solomon; Junior; Santana (2012) used factor analysis to carry out the floristic analysis of a primary forest for the restoration of a mined area.

In studies dealing with the quality of wood for energy purposes, each author used different multivariate statistics for data analysis. Protasio et al. (2012), using only CCA, were able to verify the associations between the group formed by the characteristics of Eucalyptus clones and the group formed by the characteristics of the charcoal obtained from them. Castro et al. (2013) used three multivariate analyses, CCA, PCA and AA, and concluded from them that the properties of charcoal are strongly correlated with those of the wood, especially the apparent density of the charcoal and the gravimetric yield. Gadelha et al. (2015), working in the same area, used the multivariate method MANOVA to evaluate which eucalyptus clones are suited to production for energy purposes.

In plant stratification, the 3 selected studies use the same methodology for data processing, AA and DA, as can be seen in Souza et al. (2003b), Souza et al. (2006) and Souza et al. (2012). Multivariate classification of the forest into classes of volumetric stock proved to be an efficient method for stratifying homogeneous areas in the three types of forest, which can be constituted by strata, compartments, site classes and annual production units.

In the reforestation of mined areas (Cunha et al., 2003) and/or degraded areas (Loscfu et al., 2011), there was a similarity in the use of CCA: despite the distinct areas, the results presented by the analysis were similar. Oliveira et al. (2016) used PCA, which proved efficient for data with high variance; used as a tool on an annual basis, it can better reference the ecological patterns of the area and can be used to identify indicators of forest restoration.

In studies aimed at planting, the use of AA was unanimous, as noted in the article by Grigolo et al. (2018). Of the four studies selected, 3 also used PCA to complement the study, as noted in Neto et al. (2018) and Ruhoff et al. (2009). The use of principal components showed that higher yields are correlated with proper shoot growth under conditions of lower bulk density, which provide high root dry matter production (Freddi; Ferraudo; Centurion, 2008).

In studies involving forests, among the 11 selected articles, similarity in the use of multivariate analysis was found in Rovedder et al. (2014) and Lucio et al. (2006), who used PCA to reduce the number of variables as far as possible while still representing most of the variance found. Almeida et al. (2015) and Canuto et al. (2015), however, used AA to separate the samples with greater similarity into different groups and thus allow a better analysis of the sampled data. In the remaining studies, Machado (2004) used Detrended Correspondence Analysis; Longhil et al. (2009), Regression Analysis; Silva et al. (2012), Factor Analysis; Martins; Saucer; Oliveira (2002), MANOVA and Canonical Correspondence Analysis (CCA); Trugilho; Ume; Mori (2003), CCA; Oliveira et al. (2017), Discriminant Analysis (DA); and Souza et al. (1990), DA and Cluster Analysis. No pattern in the use of multivariate analysis could be shown here, as each author used a different method to analyze the data.

IV. CONCLUSION 

The results show that studies in certain areas within the Forest Agricultural Sciences have a certain regularity in the use of multivariate analysis, using the same techniques to examine their data. Because the use of multivariate analysis demands some prior knowledge, very complex methods are rarely used in the articles surveyed, giving way to simpler analyses such as Cluster Analysis (AA) and Principal Component Analysis (PCA), which were the most widely used. The determining factor in the choice of multivariate analysis, however, is the purpose of the analysis, which generally involves the simultaneous analysis of multiple sets of factors when there is a need to reduce data, identify relationships between variables, or separate groups of similar factors, among others.

The use of multivariate analysis in studies and evaluations of experiments in the Agricultural Sciences proved to be of great value, allowing greater clarity and better interpretability of complex phenomena.